Multiple Logistic Regression
Abstract
A common problem in software cost estimation is the handling of incomplete or missing data in the databases used to develop prediction models. In such cases, the most popular and simplest way to handle missing data is to ignore either the projects or the attributes with missing observations. This technique loses valuable information and may therefore lead to inaccurate cost estimation models. Alternatively, various imputation methods can be used to estimate the missing values in a data set. These methods are applied mainly to numerical data and produce continuous estimates. However, it is well known that most cost data sets contain software projects described mostly by categorical attributes, many of which have missing values. It is therefore reasonable to use an estimation method that produces categorical rather than continuous values. The purpose of this paper is to investigate the possibility of using such a method to estimate categorical missing values in software cost databases. Specifically, the method known as Multinomial Logistic Regression (MLR) is suggested for imputation and is applied to projects from the ISBSG multi-organizational software database. Comparisons of MLR with other missing data techniques, such as listwise deletion (LD), mean imputation (MI), expectation maximization (EM) and regression imputation (RI), show that the proposed method is efficient, especially when the percentage of missing values is high.
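The idea of imputing a categorical attribute with MLR can be illustrated with a minimal sketch. It is not the paper's procedure: it assumes a toy data set with two standardized numeric predictors (hypothetical names `x1`, `x2`) and one categorical attribute (hypothetical name `platform`, with illustrative levels), and it fits the model with plain batch gradient descent rather than a statistical package. The model is fitted on the projects where the attribute is observed, and the most probable category is then assigned to the projects where it is missing.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities (shifted for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_probs(W, xi):
    """Class probabilities for one observation; the last weight is the intercept."""
    row = list(xi) + [1.0]
    return softmax([sum(w * v for w, v in zip(Wk, row)) for Wk in W])

def fit_mlr(X, y, n_classes, lr=0.5, epochs=1000):
    """Fit multinomial logistic regression by batch gradient descent (assumed settings)."""
    n_cols = len(X[0]) + 1
    W = [[0.0] * n_cols for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_cols for _ in range(n_classes)]
        for xi, yi in zip(X, y):
            row = list(xi) + [1.0]
            probs = class_probs(W, xi)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == yi else 0.0)
                for j, v in enumerate(row):
                    grad[k][j] += err * v
        for k in range(n_classes):
            for j in range(n_cols):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W

def impute_categorical(rows, target, predictors):
    """Return copies of rows with missing (None) target values replaced by MLR predictions."""
    levels = sorted({r[target] for r in rows if r[target] is not None})
    complete = [r for r in rows if r[target] is not None]
    X = [[r[p] for p in predictors] for r in complete]
    y = [levels.index(r[target]) for r in complete]
    W = fit_mlr(X, y, len(levels))
    filled = []
    for r in rows:
        r = dict(r)  # copy, so the input rows are left unchanged
        if r[target] is None:
            probs = class_probs(W, [r[p] for p in predictors])
            r[target] = levels[max(range(len(levels)), key=lambda k: probs[k])]
        filled.append(r)
    return filled

# Toy projects (invented values, not ISBSG data); None marks a missing category.
rows = [
    {"x1": -1.0, "x2": -0.5, "platform": "PC"},
    {"x1": -1.1, "x2": -0.4, "platform": "PC"},
    {"x1": -0.9, "x2": -0.6, "platform": "PC"},
    {"x1": 0.0, "x2": 1.0, "platform": "MR"},
    {"x1": 0.1, "x2": 0.9, "platform": "MR"},
    {"x1": -0.1, "x2": 1.1, "platform": "MR"},
    {"x1": 1.0, "x2": -0.5, "platform": "MF"},
    {"x1": 1.1, "x2": -0.6, "platform": "MF"},
    {"x1": 0.9, "x2": -0.4, "platform": "MF"},
    {"x1": -1.05, "x2": -0.45, "platform": None},
    {"x1": 0.95, "x2": -0.45, "platform": None},
]

filled = impute_categorical(rows, "platform", ["x1", "x2"])
print([r["platform"] for r in filled[-2:]])
```

Unlike mean or regression imputation, the imputed value here is always one of the observed categories, which is the property the abstract argues for. A real analysis would also need to encode categorical predictors and assess the imputation under different missingness percentages.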
Similar resources
Diversity in Board Ethnicity and Firm Performance: an Empirical Investigation of Selected Quoted Firms in Nigeria Omoye
This paper examines Nigeria. The ratio of the three major tribes to t board ethnic diversity. the Nigerian Stock Exchange and a cross sectional OLS regressio collected from annual reports and through board members background profile searching. The argument that there is a possibility of improving balancing. The main finding board of quoted companies in Nige Allowing for interaction among the et...
In Silico Multivariate Regression Analysis and Validation Studies on Selective MMP-13 Inhibitors
QSAR (Quantitative Structure Activity Relationship) studies were carried out on a set of 72 α-sulfone hydroxamates as Matrix Metalloproteinase-13 (MMP-13) inhibitors using multiple regression procedure. Outliers were removed based on Relative Error calculation and Extent of Extrapolation. The activity contributions of these compounds were determined from regression equation and the validation pro...
Data analysis methods for cellular network performance optimization
Modern cellular networks including GSM/GPRS and UMTS networks offer faster and more versatile communication services for the network subscribers. As a result, it becomes more and more challenging for the cellular network operators to enhance the usage of available radio resources in order to meet the expectations of the customers. Cellular networks collect vast amounts of measurement informatio...
Multi-criteria Logistic Hub Location by Network Segmentation under Criteria Weights Uncertainty (RESEARCH NOTE)
Third-party service providers locate logistic hubs to carry out their operations. Finding a proper location helps them perform better in a competitive environment. Because a proper location has multiple characteristics, the decision maker faces a multi-criteria decision-making problem. Since the location decision is a matter of long-term planning, the robustness of the decision is gettin...
FUZZY LOGISTIC REGRESSION BASED ON LEAST SQUARE APPROACH AND TRAPEZOIDAL MEMBERSHIP FUNCTION
Logistic regression is a non-linear modification of linear regression. The purpose of logistic regression analysis is to measure the effects of multiple explanatory variables, which can be continuous, on a categorical response variable. In real life there are situations in which we deal with information that is vague in nature, and there are cases that are not explained precisely. In this regard,...